Artificial gene synthesis

Artificial gene synthesis is the process of synthesizing a gene in vitro without the need for initial template DNA samples. The main method currently is oligonucleotide synthesis (also used for other applications) from digital genetic sequences, followed by annealing of the resulting fragments. In contrast, natural DNA replication requires existing DNA templates for synthesizing new DNA.

Synthesis of the first complete gene, a yeast tRNA, was demonstrated by Har Gobind Khorana and coworkers in 1972.[1] Synthesis of the first peptide- and protein-coding genes was performed in the laboratories of Herbert Boyer and Alexander Markham, respectively.[2][3]

Commercial gene synthesis services are now available from numerous companies worldwide, some of which have built their business model around this task.[4] Current gene synthesis approaches are most often based on a combination of organic chemistry and molecular biological techniques, and entire genes may be synthesized "de novo", without the need for precursor template DNA. Gene synthesis has become an important tool in many fields of recombinant DNA technology, including heterologous gene expression, vaccine development, gene therapy and molecular engineering. The synthesis of nucleic acid sequences is often more economical than classical cloning and mutagenesis procedures. The market for gene synthesis has grown steadily in recent years; experts estimated its volume at US$40 million by the end of 2007.

Gene optimization

While the ability to make increasingly long stretches of DNA efficiently and at lower prices is a technological driver of this field, increasing attention is being focused on improving the design of genes for specific purposes. Early in the genome sequencing era, gene synthesis was used as an (expensive) source of cDNAs that were predicted by genomic or partial cDNA information but were difficult to clone. As higher-quality sources of sequence-verified cloned cDNA have become available, this practice has become less urgent.

Producing large amounts of protein from gene sequences found in nature (or at least from their protein-coding regions, the open reading frames) can sometimes prove difficult, and the problem has sufficient impact that scientific conferences have been devoted to the topic.[5][6] Many of the most interesting proteins sought by molecular biologists are normally regulated to be expressed in very low amounts in wild-type cells. Redesigning these genes offers a means to improve gene expression in many cases. Rewriting the open reading frame is possible because of the degeneracy of the genetic code: up to about a third of the nucleotides in an open reading frame can be changed while still producing the same protein. The number of alternative designs available for a given protein is astronomical. For a typical protein sequence of 300 amino acids there are over 10^150 codon combinations that will encode an identical protein. Optimization methods such as replacing rarely used codons with more common codons sometimes have dramatic effects. Further optimizations, such as removing RNA secondary structures, can also be included. At least in the case of E. coli, protein expression is maximized by predominantly using codons corresponding to tRNAs that retain amino acid charging during starvation.[7] Computer programs written to perform these and other optimizations simultaneously are used to handle the enormous complexity of the task.[8] A well-optimized gene can improve protein expression 2- to 10-fold, and in some cases improvements of more than 100-fold have been reported. Because of the large number of nucleotide changes made to the original DNA sequence, the only practical way to create the newly designed genes is to use gene synthesis.
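The size of this design space can be illustrated with a short calculation. The sketch below is not taken from any gene design program; it simply multiplies, for each residue, the number of codons that encode it in the standard genetic code. The function names are hypothetical, and the protein is assumed to be given in one-letter amino acid code without a stop codon.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total; stop codons are not counted).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_encodings(protein):
    """Return how many distinct codon sequences encode `protein`."""
    return math.prod(CODON_DEGENERACY[aa] for aa in protein)


def log10_synonymous_encodings(protein):
    """Return log10 of the count, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[aa]) for aa in protein)


if __name__ == "__main__":
    peptide = "MALWK"  # arbitrary illustrative peptide
    print(count_synonymous_encodings(peptide))
    print(round(log10_synonymous_encodings(peptide), 2))
```

The exact count depends on amino acid composition: residues such as leucine, serine and arginine contribute a factor of six each, while methionine and tryptophan contribute nothing to the diversity.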

Standard methods

Chemical synthesis of oligonucleotides

Oligonucleotides are chemically synthesized from building blocks called nucleoside phosphoramidites: nucleosides whose reactive amine, hydroxyl and phosphate groups carry protecting groups that prevent incorrect interactions. One phosphoramidite is added at a time: the 5' hydroxyl group of the growing chain is deprotected, the next base is coupled, and the cycle repeats. Synthesis therefore proceeds in the 3' to 5' direction, the reverse of biosynthesis. At the end, all the protecting groups are removed. Nevertheless, because this is a chemical process, incorrect interactions occur and lead to some defective products. The longer the oligonucleotide being synthesized, the more defects accumulate, so the process is only practical for producing short sequences. The current practical limit is about 200 nucleotides for an oligonucleotide of sufficient quality to be used directly in a biological application. HPLC can be used to isolate products with the proper sequence. Large numbers of oligos can now also be synthesized in parallel on gene chips. For optimal performance in subsequent gene synthesis procedures, however, they should be prepared individually and at larger scale.
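Why length limits quality can be seen from a simple model. The sketch below assumes that every coupling step succeeds independently with the same probability, so the fraction of full-length product for an oligonucleotide of n nucleotides is the coupling efficiency raised to the power n - 1. The 99% efficiency used in the example is an illustrative assumption, not a figure from the text, and the function name is hypothetical.

```python
def full_length_fraction(length_nt, coupling_efficiency):
    """Fraction of chains reaching full length under a uniform-efficiency model.

    An oligonucleotide of `length_nt` nucleotides needs `length_nt - 1`
    coupling steps, because the first nucleoside is already on the support.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    # Assumed 99% efficiency per coupling step (illustrative only).
    for n in (20, 50, 100, 200):
        print(n, round(full_length_fraction(n, 0.99), 3))
```

Under this model the full-length fraction falls geometrically with length, which is consistent with the practical ceiling on oligonucleotide length described above.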

Annealing based connection of oligonucleotides

Usually, a set of individually designed oligonucleotides is made on automated solid-phase synthesizers, purified and then connected by specific annealing and standard ligation or polymerase reactions. To improve the specificity of oligonucleotide annealing, the synthesis step relies on a set of thermostable DNA ligase and polymerase enzymes. To date, several methods for gene synthesis have been described, such as the ligation of phosphorylated overlapping oligonucleotides,[1][2] the Fok I method[3] and a modified form of ligase chain reaction for gene synthesis. Additionally, several PCR assembly approaches have been described.[9] They usually employ oligonucleotides 40-50 nt in length that overlap each other. These oligonucleotides are designed to cover most of the sequence of both strands, and the full-length molecule is generated progressively by overlap extension (OE) PCR,[9] thermodynamically balanced inside-out (TBIO) PCR[10] or combined approaches.[11] The most commonly synthesized genes range in size from 600 to 1,200 bp, although much longer genes have been made by connecting previously assembled fragments of under 1,000 bp. In this size range it is necessary to test several candidate clones, confirming the sequence of the cloned synthetic gene by automated sequencing methods.
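The idea of covering both strands with overlapping oligonucleotides can be sketched as follows. This is a deliberately naive tiling with fixed oligonucleotide length and fixed overlap, alternating between the top and bottom strand. It is not the design procedure of any of the cited methods, and the function names and default values are assumptions chosen for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase ACGT)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_oligos(gene, oligo_len=40, overlap=20):
    """Split `gene` into overlapping oligos that alternate between strands.

    Returns a list of (start, strand, sequence) tuples. Oligos on the "-"
    strand are stored as reverse complements, written 5' to 3'.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and smaller than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        fragment = gene[start:start + oligo_len]
        strand = "+" if index % 2 == 0 else "-"
        seq = fragment if strand == "+" else reverse_complement(fragment)
        oligos.append((start, strand, seq))
        if start + oligo_len >= len(gene):
            break  # this oligo reaches the end of the gene
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    toy_gene = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGGTGATGTT"
    for start, strand, seq in tile_oligos(toy_gene, oligo_len=30, overlap=15):
        print(start, strand, seq)
```

Real design programs also adjust the boundaries so that overlaps have similar melting temperatures and avoid repeats, which a fixed-step tiling like this one does not attempt.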

Limitations

Because the assembly of the full-length gene product relies on the efficient and specific alignment of long single-stranded oligonucleotides, synthesis success is sensitive to extended sequence regions that form secondary structures caused by inverted repeats, that have extraordinarily high or low GC content, or that contain repetitive structures. Usually these segments of a particular gene can only be synthesized by splitting the procedure into several consecutive steps and a final assembly of shorter sub-sequences, which in turn leads to a significant increase in the time and labor needed for its production.

The result of a gene synthesis experiment depends strongly on the quality of the oligonucleotides used. For these annealing-based gene synthesis protocols, the quality of the product is directly and exponentially dependent on the correctness of the employed oligonucleotides. Alternatively, after performing gene synthesis with oligos of lower quality, more effort must be made in downstream quality assurance during clone analysis, which is usually done by time-consuming standard cloning and sequencing procedures.

Another problem associated with all current gene synthesis methods is the high frequency of sequence errors that results from the use of chemically synthesized oligonucleotides. The error frequency increases with longer oligonucleotides, and as a consequence the percentage of correct product decreases dramatically as more oligonucleotides are used. The mutation problem could be solved by using shorter oligonucleotides to assemble the gene. However, all annealing-based assembly methods require the primers to be mixed together in one tube. In this case, shorter overlaps do not always allow precise and specific annealing of complementary primers, resulting in the inhibition of full-length product formation.

Manual design of oligonucleotides is a laborious procedure and does not guarantee the successful synthesis of the desired gene. For optimal performance of almost all annealing-based methods, the melting temperatures of the overlapping regions should be similar for all oligonucleotides. The necessary primer optimization should be performed using specialized oligonucleotide design programs. Several solutions for automated primer design for gene synthesis have been presented so far.[12][13]
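As a rough illustration of the melting-temperature requirement, the sketch below scores a set of overlap regions with the Wallace rule (2 °C for each A or T, 4 °C for each G or C). This rule is only a crude estimate and is an assumption of this example; the design programs cited above are not claimed to use it, and the function names are hypothetical.

```python
def wallace_tm(seq):
    """Estimate melting temperature in degrees C with the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


def tm_spread(overlaps):
    """Return the difference between the highest and lowest estimated Tm."""
    temps = [wallace_tm(overlap) for overlap in overlaps]
    return max(temps) - min(temps)


if __name__ == "__main__":
    # Hypothetical overlap regions taken from neighbouring oligonucleotides.
    overlaps = ["ATGGCTAGCAAAGGAGAAGA", "GGCCGCGGGCCCGGTACCGG", "AATTTATATAATTTAAATAT"]
    for overlap in overlaps:
        print(overlap, wallace_tm(overlap))
    print("spread:", tm_spread(overlaps))
```

A large spread signals that some overlaps would anneal far less specifically than others at a common assembly temperature, which is the situation design software tries to avoid by moving oligonucleotide boundaries.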

Error correction procedures

To overcome problems associated with oligonucleotide quality several elaborate strategies have been developed, employing either separately prepared fishing oligonucleotides,[14] mismatch binding enzymes of the mutS family[15] or specific endonucleases from bacteria or phages.[16] Nevertheless, all these strategies increase time and costs for gene synthesis based on the annealing of chemically synthesized oligonucleotides.
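The cost of relying on clone screening alone, without error correction, can be estimated with a simple model. The sketch below assumes that errors occur independently at every position with the same per-base rate, so the fraction of error-free clones for a gene of N bases is (1 - e)^N. The error rate in the example is an illustrative assumption, not a measured value, and the function names are hypothetical.

```python
import math


def error_free_fraction(gene_length, per_base_error_rate):
    """Probability that a clone of `gene_length` bases carries no error."""
    return (1.0 - per_base_error_rate) ** gene_length


def clones_to_screen(gene_length, per_base_error_rate, confidence=0.95):
    """Clones to sequence to find at least one correct clone with `confidence`."""
    p = error_free_fraction(gene_length, per_base_error_rate)
    if p >= 1.0:
        return 1
    if p <= 0.0:
        raise ValueError("no error-free clones expected under this model")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    # Assumed error rate of 1 in 500 bases (illustrative only).
    for length in (300, 600, 1200):
        p = error_free_fraction(length, 1 / 500)
        print(length, round(p, 3), clones_to_screen(length, 1 / 500))
```

Because the error-free fraction shrinks exponentially with gene length, the number of clones to be sequenced grows quickly, which is the trade-off that the error correction strategies above aim to shift.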

Increasingly, genes are ordered in sets that include functionally related genes or multiple sequence variants of a single gene. Virtually all of the therapeutic proteins in development, such as monoclonal antibodies, are optimized by testing many gene variants for improved function or expression.

Applications

Major applications of synthetic genes include the synthesis of DNA sequences that were identified by high-throughput sequencing but never cloned into plasmids, and the ability to obtain genes for vaccine research safely, without the need to grow the full pathogens. Manipulating the genetic code digitally before it is synthesized into DNA can be used to optimize protein expression in a particular host, or to remove non-functional segments in order to facilitate further replication of the DNA.

Entire genomes

Synthia and Mycoplasma laboratorium

On June 28, 2007, a team at the J. Craig Venter Institute published an article in Science Express saying that they had successfully transplanted the natural DNA from a Mycoplasma mycoides bacterium into a Mycoplasma capricolum cell, creating a bacterium that behaved like M. mycoides.[17]

On October 6, 2007, Craig Venter announced in an interview with the UK newspaper The Guardian that the same team had chemically synthesized a modified version of the single chromosome of Mycoplasma genitalium. The chromosome was modified to eliminate all genes that tests in live bacteria had shown to be unnecessary. The next planned step in this minimal genome project is to transplant the synthesized minimal genome into a bacterial cell with its old DNA removed; the resulting bacterium will be called Mycoplasma laboratorium. The next day the Canadian bioethics group ETC Group issued a statement through its representative, Pat Mooney, saying Venter's "creation" was "a chassis on which you could build almost anything". The synthesized genome had not yet been transplanted into a working cell.[18]

On May 21, 2010, Science reported that the Venter group had successfully synthesized the genome of the bacterium Mycoplasma mycoides from a computer record, and transplanted the synthesized genome into the existing cell of a Mycoplasma capricolum bacterium that had had its DNA removed. The "synthetic" bacterium was viable, i.e. capable of replicating billions of times. The team had originally planned to use the M. genitalium bacterium they had previously been working with, but switched to M. mycoides because the latter bacterium grows much faster, which translated into quicker experiments.[19] Venter describes it as "the first species.... to have its parents be a computer".[20] The transformed bacterium is dubbed "Synthia" by ETC. A Venter spokesperson has declined to confirm any breakthrough at the time of this writing, likely because similar genetic introduction techniques such as transfection, transformation, transduction and protofection have been a standard research practice for many years. Now that the technique has been proven to work with the M. mycoides genome, the next project is presumably to go back to the minimized M. genitalium and transplant it into a cell to create the previously mentioned M. laboratorium.

Notes

  1. Khorana HG, Agarwal KL, Büchi H, et al. (December 1972). "Studies on polynucleotides. 103. Total synthesis of the structural gene for an alanine transfer ribonucleic acid from yeast". J. Mol. Biol. 72 (2): 209–217. doi:10.1016/0022-2836(72)90146-5. PMID 4571075.
  2. Itakura K, Hirose T, Crea R, et al. (December 1977). "Expression in Escherichia coli of a chemically synthesized gene for the hormone somatostatin". Science 198 (4321): 1056–1063. doi:10.1126/science.412251. PMID 412251. http://www.sciencemag.org/cgi/pmidlookup?view=long&pmid=412251.
  3. Edge MD, Green AR, Heathcliffe GR, et al. (August 1981). "Total synthesis of a human leukocyte interferon gene". Nature 292 (5825): 756–62. doi:10.1038/292756a0. PMID 6167861.
  4. For example, the company DNA 2.0 was established in 2003 in Menlo Park, CA as a "synthetic genomics company" (quoted page).
  5. "Difficult to Express Proteins". Sixth Annual PEGS Summit. Cambridge Healthtech Institute. 2010. http://www.pegsummit.com/dpx. Retrieved 11 May 2010.
  6. Liszewski, Kathy (1 May 2010), "New Tools Facilitate Protein Expression", Genetic Engineering & Biotechnology News, Bioprocessing (Mary Ann Liebert) 30 (9): 1, 40–41, http://www.genengnews.com/gen-articles/new-tools-facilitate-protein-expression/3273/, retrieved 11 May 2010
  7. Welch M, Govindarajan M, Ness JE, Villalobos A, Gurney A, Minshull J, Gustafsson C (2009). "Design Parameters to Control Synthetic Gene Expression in Escherichia coli". PLoS ONE 4 (9): e7002. doi:10.1371/journal.pone.0007002. PMC 2736378. PMID 19759823. http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007002.
  8. "Protein Expression". DNA2.0. https://www.dna20.com/index.php?pageID=292. Retrieved 11 May 2010.
  9. Fuhrmann M, Oertel W, Hegemann P (August 1999). "A synthetic gene coding for the green fluorescent protein (GFP) is a versatile reporter in Chlamydomonas reinhardtii". Plant J. 19 (3): 353–61. doi:10.1046/j.1365-313X.1999.00526.x. PMID 10476082. http://www3.interscience.wiley.com/resolve/openurl?genre=article&sid=nlm:pubmed&issn=0960-7412&date=1999&volume=19&issue=3&spage=353.
  10. Mandecki W, Bolling TJ (August 1988). "FokI method of gene synthesis". Gene 68 (1): 101–7. doi:10.1016/0378-1119(88)90603-8. PMID 3265397.
  11. Stemmer WP, Crameri A, Ha KD, Brennan TM, Heyneker HL (October 1995). "Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides". Gene 164 (1): 49–53. doi:10.1016/0378-1119(95)00511-4. PMID 7590320. http://linkinghub.elsevier.com/retrieve/pii/0378111995005114.
  12. Gao X, Yo P, Keith A, Ragan TJ, Harris TK (November 2003). "Thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis: a novel method of primer design for high-fidelity assembly of longer gene sequences". Nucleic Acids Res. 31 (22): e143. doi:10.1093/nar/gng143. PMC 275580. PMID 14602936. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=14602936.
  13. Young L, Dong Q (2004). "Two-step total gene synthesis method". Nucleic Acids Res. 32 (7): e59. doi:10.1093/nar/gnh058. PMC 407838. PMID 15087491. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=15087491.
  14. Hoover DM, Lubkowski J (May 2002). "DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis". Nucleic Acids Res. 30 (10): e43. doi:10.1093/nar/30.10.e43. PMC 115297. PMID 12000848. http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=12000848.
  15. Villalobos A, Ness JE, Gustafsson C, Minshull J, Govindarajan S (2006). "Gene Designer: a synthetic biology tool for constructing artificial DNA segments". BMC Bioinformatics 7: 285. doi:10.1186/1471-2105-7-285. PMC 1523223. PMID 16756672. http://www.biomedcentral.com/1471-2105/7/285.
  16. Tian J, Gong H, Sheng N, et al. (December 2004). "Accurate multiplex gene synthesis from programmable DNA microchips". Nature 432 (7020): 1050–4. doi:10.1038/nature03151. PMID 15616567.
  17. "Genome Transplantation in Bacteria: Changing One Species to Another". Science. 2007-06-28. http://www.sciencemag.org/cgi/content/abstract/317/5838/632. Retrieved 2010-05-22.
  18. Pilkington, Ed (2007-10-06). "I am creating artificial life, declares US gene pioneer". London: The Guardian. http://www.guardian.co.uk/science/2007/oct/06/genetics.climatechange. Retrieved 2010-05-22.
  19. "Synthetic Genome Brings New Life to Bacterium". Science. 2010-05-21. http://www.sciencemag.org/cgi/reprint/328/5981/958.pdf. Retrieved 2010-05-21.
  20. "How scientists made 'artificial life'". BBC News. 2010-05-20. http://news.bbc.co.uk/1/hi/sci/tech/8695992.stm. Retrieved 2010-05-21.